Abstract: While most useful information theoretic inequalities 
can be deduced from the basic properties of entropy or mutual 
information, Shannon's entropy power inequality (EPI) seems to 
be an exception: available information theoretic proofs of the EPI 
hinge on integral representations of differential entropy using 
either Fisher's information (FI) or minimum mean-square error 
(MMSE). In this paper, we first present a unified view of proofs 
via FI and MMSE, showing that they are essentially dual versions 
of the same proof, and then fill the gap by providing a new, simple 
proof of the EPI, which is solely based on the properties of mutual 
information and sidesteps both FI and MMSE representations. 

I. Introduction 

Shannon's entropy power inequality (EPI) gives a lower 
bound on the differential entropy of the sum of independent 
random variables X, Y with densities: 

$$\exp(2h(X + Y)) \ge \exp(2h(X)) + \exp(2h(Y)) \tag{1}$$

with equality if X and Y are Gaussian random variables. The 
differential entropy of the probability density function p(x) of 
X is defined as 

$$h(X) = E\left\{\log \frac{1}{p(X)}\right\}, \tag{2}$$

where it is assumed throughout this paper that all logarithms are 
natural. 
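
As a numerical illustration of (1), which is not part of the original argument, consider the hypothetical example of independent X and Y uniformly distributed on [0, 1]. Then h(X) = h(Y) = 0, the sum X + Y has a triangular density on [0, 2] with h(X + Y) = 1/2, and (1) reads e ≥ 2. The following Python sketch estimates these entropies with a simple midpoint rule; the function names and the grid size are illustrative assumptions.

```python
import math

def differential_entropy(pdf, lo, hi, n=20000):
    """Midpoint-rule estimate of h = -integral of p log p over [lo, hi], in nats."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0.0:
            total -= p * math.log(p) * dx
    return total

def uniform_pdf(x):
    """Density of X (and of Y): Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def triangular_pdf(s):
    """Density of X + Y for independent X, Y ~ Uniform(0, 1)."""
    if 0.0 <= s <= 1.0:
        return s
    if 1.0 < s <= 2.0:
        return 2.0 - s
    return 0.0

h_x = differential_entropy(uniform_pdf, 0.0, 1.0)       # analytically 0 nats
h_sum = differential_entropy(triangular_pdf, 0.0, 2.0)  # analytically 1/2 nat

lhs = math.exp(2.0 * h_sum)      # left-hand side of (1)
rhs = 2.0 * math.exp(2.0 * h_x)  # right-hand side of (1), since h(X) = h(Y)
print(lhs, rhs, lhs >= rhs)
```

The inequality is strict here, as expected for non-Gaussian variables.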

The EPI finds its application in proving converses of channel 
or source coding theorems. It was used by Shannon as early 
as his 1948 paper [1] to bound the capacity of non-Gaussian 
additive noise channels. Recently, it was used to determine the 
capacity region of the Gaussian MIMO broadcast channel [2]. 
The EPI also finds application in blind source separation and 
deconvolution (see, e.g., [3]) and is instrumental in proving a 
strong version of the central limit theorem with convergence 
in relative entropy [4]. 

Shannon's proof of the EPI [1] was incomplete in that he 
only checked that the necessary condition for a local minimum 
of h(X + Y) is satisfied. Available rigorous proofs of the EPI 
are in fact proofs of an alternative statement 

$$h(\sqrt{\lambda}\,X + \sqrt{1-\lambda}\,Y) \ge \lambda h(X) + (1-\lambda) h(Y) \tag{3}$$

for any $0 < \lambda < 1$, which amounts to the concavity of the 
entropy under the "variance preserving" transformation [5]: 

$$(X, Y) \longmapsto W = \sqrt{\lambda}\,X + \sqrt{1-\lambda}\,Y. \tag{4}$$



To see that (3) is equivalent to (1), define U, V by the relations 
$X = \sqrt{\lambda}\,U$, $Y = \sqrt{1-\lambda}\,V$, and rewrite (1) as follows: 

$$e^{2h(\sqrt{\lambda}\,U + \sqrt{1-\lambda}\,V)} \ge \lambda e^{2h(U)} + (1-\lambda) e^{2h(V)}.$$

Taking logarithms of both sides, (3) follows from the concavity 
of the logarithm. Conversely, taking exponentials, (3) written 
for U, V implies (1) for $\lambda$ chosen so that U and V have 
equal entropies: $\exp 2h(U) = \exp 2h(V)$, that is, 
$\exp(2h(X))/\lambda = \exp(2h(Y))/(1-\lambda)$, or 
$\lambda = e^{2h(X)}/(e^{2h(X)} + e^{2h(Y)})$. 

The first rigorous proof of the EPI was given by Stam [6] 
(see also Blachman [7]). It is based on the properties of 
Fisher's information (FI) 

$$J(X) = E\left\{\left(\frac{p'(X)}{p(X)}\right)^{2}\right\}, \tag{5}$$

the link between differential entropy and FI being de Bruijn's 
identity [8, Thm. 17.7.2]: 

$$\frac{d}{dt} h(X + \sqrt{t}\,Z) = \frac{1}{2} J(X + \sqrt{t}\,Z), \tag{6}$$

where $Z \sim \mathcal{N}(0, 1)$ is a standard Gaussian random 
variable, which is independent of X. Recently, Verdú, Guo and 
Shamai [9], [10] provided an alternative proof of the EPI based 
on the properties of the MMSE in estimating the input X to 
a Gaussian channel given the output $Y = \sqrt{t}\,X + Z$, where t 
denotes the signal-to-noise ratio. This MMSE is achieved by 
the conditional mean estimator $\hat{X}(Y) = E(X|Y)$ and is given 
by the conditional variance 

$$\mathrm{Var}(X|Y) = E\{(X - E\{X|Y\})^{2}\}, \tag{7}$$

where the expectation is taken over the joint distribution of the 
random variables X and Y. The connection between input-output 
mutual information $I(X; Y) = h(Y) - h(Z)$ and MMSE is made by 
the following identity derived in [11]: 

$$\frac{d}{dt} I(X; \sqrt{t}\,X + Z) = \frac{1}{2} \mathrm{Var}(X | \sqrt{t}\,X + Z). \tag{8}$$

This identity turns out to be equivalent to de Bruijn's 
identity (6). It has been claimed [10] that using the alternative 
MMSE representation in place of the FI representation is more 
insightful and convenient for proving the EPI. 

In this paper, we show that it is possible to avoid both 
MMSE and FI representations and use only basic properties 
of mutual information. The new proof of the EPI presented 
in this paper is based on a convexity inequality for mutual 
information under the variance preserving transformation (4): 

Theorem 1: If X and Y are independent random variables, 
and if Z is Gaussian independent of X, Y, then 

$$I(\sqrt{\lambda}\,X + \sqrt{1-\lambda}\,Y + \sqrt{t}\,Z; Z) \le \lambda I(X + \sqrt{t}\,Z; Z) + (1-\lambda) I(Y + \sqrt{t}\,Z; Z) \tag{9}$$

for all $0 < \lambda < 1$ and $t > 0$. 

Apart from its intrinsic interest, we show that inequality (9) 
reduces to the EPI by letting $t \to \infty$. 
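
As a sanity check on (9), one may look at the special case where X and Y are themselves Gaussian, a case not treated separately in the text. Assuming Z is standard Gaussian, I(X + √t Z; Z) = h(X + √t Z) - h(X) = ½ log(1 + t/Var(X)), and W has variance λ Var(X) + (1 - λ) Var(Y), so (9) reduces to the convexity of v ↦ log(1 + t/v). The Python sketch below evaluates the gap between both sides on a small, arbitrarily chosen grid; the function names are illustrative assumptions.

```python
import math

def gaussian_mi(var_x, t):
    """I(X + sqrt(t) Z; Z) in nats for Gaussian X of variance var_x and standard Gaussian Z."""
    return 0.5 * math.log(1.0 + t / var_x)

def theorem1_gap(var_x, var_y, lam, t):
    """Right-hand side minus left-hand side of (9) when X and Y are both Gaussian."""
    var_w = lam * var_x + (1.0 - lam) * var_y  # variance of W
    lhs = gaussian_mi(var_w, t)
    rhs = lam * gaussian_mi(var_x, t) + (1.0 - lam) * gaussian_mi(var_y, t)
    return rhs - lhs

gaps = [theorem1_gap(vx, vy, lam, t)
        for vx in (0.5, 1.0, 4.0)
        for vy in (0.5, 1.0, 4.0)
        for lam in (0.1, 0.5, 0.9)
        for t in (0.01, 1.0, 100.0)]
print(min(gaps) >= -1e-12)
```

In this special case the gap vanishes when Var(X) = Var(Y) and is positive otherwise.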

Before turning to the proof of Theorem 1, we make the 
connection between earlier proofs of the EPI via FI and via 
MMSE by focusing on the essential ingredients common to 
the proofs. This will give an idea of the level of difficulty that 
is required to understand the conventional approaches, while 
also serving as a guide to understand the new proof which uses 
similar ingredients, but is comparatively simpler and shorter. 

The remainder of the paper is organized as follows. 
Section II gives a direct proof of a simple relation between FI 
and MMSE, interprets (6) and (8) as dual consequences of 
a generalized identity, and explores the relationship between 
the two previous proofs of the EPI via FI and via MMSE. It 
is shown that these are essentially dual versions of the same 
proof; they follow the same lines of thought and each step has 
an equivalent formulation, and a similar interpretation, in terms 
of FI and MMSE. Section III then proves Theorem 1 and the 
EPI using two basic ingredients common to earlier approaches, 
namely 1) a "data processing" argument applied to 2) a 
Gaussian perturbation method. The reader may wish to skip 
to this section first, which does not use the results presented 
earlier. The new approach has the advantage of being very 
simple in that it relies only on the basic properties of mutual 
information. 

II. Proofs of the EPI via FI and MMSE revisited 

The central link between FI and MMSE takes the form of 
a simple relation which shows that they are complementary 
quantities in the case of a standard Gaussian perturbation Z 
independent of X: 



$$J(X + Z) + \mathrm{Var}(X | X + Z) = 1. \tag{10}$$

This identity was mentioned in [11] to show that (6) and (8) 
are equivalent. We first provide a direct proof of this relation, 
and then use it to unify and simplify existing proofs of the EPI 
via FI and via MMSE. In particular, two essential ingredients, 
namely, Fisher's information inequality [7], [12], and a related 
inequality for MMSE [9], [10], will be shown to be equivalent 
from (10). 

A. A new proof of (10) 

Fisher's information (5) can be written in the form 

$$J(X) = E\{S^{2}(X)\} = \mathrm{Var}\{S(X)\}, \tag{11}$$

where $S(X) = p'(X)/p(X)$ is a zero-mean random variable. 
The following conditional mean representation is due to 
Blachman [7]: 

$$S(X + Z) = E\{S(Z) | X + Z\}. \tag{12}$$

By the "law of total variance", this gives 

$$\begin{aligned}
J(X + Z) &= \mathrm{Var}\{S(X + Z)\} \\
&= \mathrm{Var}\{E\{S(Z) | X + Z\}\} \\
&= \mathrm{Var}\{S(Z)\} - \mathrm{Var}\{S(Z) | X + Z\} \\
&= J(Z) - \mathrm{Var}\{S(Z) | X + Z\}.
\end{aligned} \tag{13}$$



We now use the fact that Z is standard Gaussian. It is easily 
seen by direct calculation that $S(Z) = -Z$ and $J(Z) = 1$, 
and, therefore, $J(X + Z) = 1 - \mathrm{Var}\{Z | X + Z\}$. Since 
$Z - E(Z | X + Z) = E(X | X + Z) - X$, we have 
$\mathrm{Var}\{Z | X + Z\} = \mathrm{Var}\{X | X + Z\}$, thereby showing (10). ■ 
Note that when Z is Gaussian but not standard Gaussian, we 
have $S(Z) = -Z/\mathrm{Var}(Z)$ and $J(Z) = 1/\mathrm{Var}(Z)$, and (10) 
generalizes to 

$$\mathrm{Var}(Z)\, J(X + Z) + J(Z)\, \mathrm{Var}(X | X + Z) = 1. \tag{14}$$



Another proof, which is based on a data processing argument 
and avoids Blachman's representation, is given in [13]. 
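
The complementary relation (10) can also be checked numerically on a non-Gaussian input. The Python sketch below assumes, purely for illustration, that X is equiprobable on {-1, +1} and Z is standard Gaussian. Under that assumption X + Z has a two-component Gaussian mixture density, the conditional mean is E{X | X + Z = y} = tanh(y), and the conditional variance at y is 1 - tanh²(y). The function names, integration range and grid size are illustrative choices.

```python
import math

def gauss_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def output_pdf(y):
    """Density of X' = X + Z for X equiprobable on {-1, +1} and standard Gaussian Z."""
    return 0.5 * (gauss_pdf(y - 1.0) + gauss_pdf(y + 1.0))

def output_pdf_prime(y):
    """Derivative of output_pdf with respect to y."""
    return 0.5 * (-(y - 1.0) * gauss_pdf(y - 1.0) - (y + 1.0) * gauss_pdf(y + 1.0))

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    """Midpoint rule; the integrands below are negligible outside [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

# Fisher information J(X + Z): integral of p'(y)^2 / p(y).
fisher = integrate(lambda y: output_pdf_prime(y) ** 2 / output_pdf(y))
# MMSE Var(X | X + Z): average of the conditional variance 1 - tanh(y)^2.
mmse = integrate(lambda y: (1.0 - math.tanh(y) ** 2) * output_pdf(y))

print(fisher + mmse)  # should be close to 1 according to (10)
```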

B. Interpretation 

Equation (10) provides a new estimation theoretic interpretation 
of Fisher's information of a noisy version X' = X + Z 
of X. It is just the complementary quantity to the MMSE that 
results from estimating X from X'. The estimation is all the 
better as the MMSE is lower, that is, as X' provides 
higher FI. Thus Fisher's information is a measure of least 
squares estimation's efficiency, when estimation is made in 
additive Gaussian noise. 

To illustrate, consider the special case of a Gaussian random 
variable X. Then the best estimator is the linear regression 
estimator, with MMSE equal to $\mathrm{Var}(X|X') = (1 - \rho^{2})\,\mathrm{Var}(X)$, 
where $\rho = \sqrt{\mathrm{Var}(X)/\mathrm{Var}(X')}$ is the correlation factor 
between X and X': 

$$\mathrm{Var}(X|X') = \frac{\mathrm{Var}(X)}{\mathrm{Var}(X) + 1}. \tag{15}$$

Meanwhile, J(X') is simply the reciprocal of the variance 
of X': 

$$J(X') = \frac{1}{\mathrm{Var}(X) + 1}. \tag{16}$$

Both quantities sum to one, in accordance with (10). In the 
case of non-Gaussian noise, we have the more general 
identity (13), which also links Fisher information and conditional 
variance, albeit in a more complicated form. 

C. Dual versions of de Bruijn's Identity 

De Bruijn's identity can be stated in the form [5] 

$$\left.\frac{d}{dt} h(X + \sqrt{t}\,Z)\right|_{t=0} = \frac{1}{2} J(X)\, \mathrm{Var}(Z). \tag{17}$$



The conventional technical proof of (17) is obtained by 
integrating by parts using a diffusion equation satisfied by the 
Gaussian distribution (see e.g., [7] and [8, Thm. 17.7.2]). A 
simpler proof of a more general result is included in [13]. 
From (17), we deduce the following. 

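Theorem 2 (de Bruijn's identity): For any two independent 
random variables X and Z, 

$$\frac{d}{dt} h(X + \sqrt{t}\,Z) = \frac{1}{2} J(X + \sqrt{t}\,Z)\, \mathrm{Var}(Z) \tag{18}$$

if Z is Gaussian, and 

$$\frac{d}{dt} h(X + \sqrt{t}\,Z) = \frac{1}{2} J(X)\, \mathrm{Var}(Z | X + \sqrt{t}\,Z) \tag{19}$$

if X is Gaussian. 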

This theorem is essentially contained in [11]. In fact, noting 
that $I(X; \sqrt{t}\,X + Z) = h(\sqrt{t}\,X + Z) - h(Z)$, it is easily 
seen that (19), with X and Z interchanged, is the identity (8) 
of Guo, Verdú and Shamai. Written in the form (19) it is 
clear that this is a dual version of the conventional de Bruijn's 
identity (18). Note that both identities reduce to (17) for t = 0. 
For completeness we include a simple proof for t > 0. 

Proof: Equation (18) easily follows from (17) using the 
stability property of Gaussian distributions under convolution: 
substitute $X + \sqrt{t'}\,Z'$ for X in (17), where Z and Z' are 
taken to be iid Gaussian random variables, and use the fact 
that $\sqrt{t}\,Z + \sqrt{t'}\,Z'$ and $\sqrt{t + t'}\,Z$ are identically distributed. 

To prove (19) we use the complementary relation (14) in 
the form 

$$\mathrm{Var}(X)\, J(X + \sqrt{t}\,Z) + t\, J(X)\, \mathrm{Var}(Z | X + \sqrt{t}\,Z) = 1, \tag{20}$$

where X is Gaussian. Let u = 1/t. By (18) (with X and Z 
interchanged), we have 

$$\begin{aligned}
\frac{d}{dt} h(X + \sqrt{t}\,Z) &= \frac{d}{dt} \left\{ h(\sqrt{u}\,X + Z) + \frac{1}{2} \log t \right\} \\
&= -\frac{1}{2t^{2}}\, \mathrm{Var}(X)\, J(\sqrt{u}\,X + Z) + \frac{1}{2t} \\
&= -\frac{1}{2t}\, \mathrm{Var}(X)\, J(X + \sqrt{t}\,Z) + \frac{1}{2t},
\end{aligned}$$

which combined with (20) proves (19). ■ 
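
The form (18) of de Bruijn's identity lends itself to a numerical check. The Python sketch below assumes, for illustration only, that X is equiprobable on {-1, +1} and Z is standard Gaussian (so Var(Z) = 1), and compares a central finite difference of t ↦ h(X + √t Z) at t = 1 with ½ J(X + √t Z). The function names, step size and integration grid are illustrative choices, not part of the original derivation.

```python
import math

def mixture_pdf(y, t):
    """Density of X + sqrt(t) Z for X equiprobable on {-1, +1}, standard Gaussian Z."""
    c = 1.0 / math.sqrt(2.0 * math.pi * t)
    return 0.5 * c * (math.exp(-(y - 1.0) ** 2 / (2.0 * t))
                      + math.exp(-(y + 1.0) ** 2 / (2.0 * t)))

def mixture_pdf_prime(y, t):
    """Derivative of mixture_pdf with respect to y."""
    c = 1.0 / math.sqrt(2.0 * math.pi * t)
    return 0.5 * c * (-(y - 1.0) / t * math.exp(-(y - 1.0) ** 2 / (2.0 * t))
                      - (y + 1.0) / t * math.exp(-(y + 1.0) ** 2 / (2.0 * t)))

def integrate(f, lo=-15.0, hi=15.0, n=30000):
    """Midpoint rule; the integrands below are negligible outside [lo, hi] for t near 1."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

def entropy(t):
    """Differential entropy h(X + sqrt(t) Z) in nats."""
    return -integrate(lambda y: mixture_pdf(y, t) * math.log(mixture_pdf(y, t)))

def fisher(t):
    """Fisher information J(X + sqrt(t) Z)."""
    return integrate(lambda y: mixture_pdf_prime(y, t) ** 2 / mixture_pdf(y, t))

t, dt = 1.0, 1e-3
lhs = (entropy(t + dt) - entropy(t - dt)) / (2.0 * dt)  # finite-difference d/dt h
rhs = 0.5 * fisher(t)                                   # right-hand side of (18)
print(lhs, rhs)
```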

D. Equivalent integral representations of differential entropy 

Consider any random variable X with density and finite 
variance $\sigma^{2} = \mathrm{Var}(X)$. Its non-Gaussianness is defined as the 
divergence with respect to a Gaussian random variable $X_G$ 
with identical second centered moments, and is given by 

$$D_h(X) = h(X_G) - h(X), \tag{21}$$

where $h(X_G) = \frac{1}{2} \log(2\pi e \sigma^{2})$. Let Z be standard Gaussian, 
independent of X. From (18), we obtain 

$$\frac{d}{dt} D_h(X + \sqrt{t}\,Z) = -\frac{1}{2} D_J(X + \sqrt{t}\,Z), \tag{22}$$

where 

$$D_J(X) = J(X) - J(X_G). \tag{23}$$



Here $J(X_G) = 1/\sigma^{2}$ and (23) is nonnegative by the 
Cramér-Rao inequality. Now for t = 0, $D_h(X + \sqrt{t}\,Z) = D_h(X)$, and 
since non-Gaussianness is scale invariant, 
$D_h(X + \sqrt{t}\,Z) = D_h(Z + X/\sqrt{t}) \to D_h(Z) = 0$ as $t \to +\infty$. Therefore, 
integrating (22) from t = 0 to $+\infty$ we obtain a FI integral 
representation for differential entropy [4] 

$$D_h(X) = \frac{1}{2} \int_0^{\infty} D_J(X + \sqrt{t}\,Z)\, dt \tag{24}$$

or 

$$h(X) = \frac{1}{2} \log(2\pi e \sigma^{2}) - \frac{1}{2} \int_0^{\infty} \left[ J(X + \sqrt{t}\,Z) - \frac{1}{\sigma^{2} + t} \right] dt. \tag{25}$$

Similarly, from (19) with X and Z interchanged, we obtain a 
dual identity: 

$$\frac{d}{dt} D_h(\sqrt{t}\,X + Z) = \frac{1}{2} D_V(X | \sqrt{t}\,X + Z), \tag{26}$$

where for $Y = \sqrt{t}\,X + Z$ and $Y_G = \sqrt{t}\,X_G + Z$, 

$$D_V(X|Y) = \mathrm{Var}(X_G | Y_G) - \mathrm{Var}(X | Y). \tag{27}$$

Again this quantity is nonnegative, because 
$\mathrm{Var}(X_G | Y_G) = \sigma^{2}/(t\sigma^{2} + 1)$ is the MMSE achievable by a linear 
estimator, which is suboptimal for non-Gaussian X. Now for 
t = 0, $D_h(\sqrt{t}\,X + Z) = D_h(Z)$ vanishes, and since 
non-Gaussianness is scale invariant, 
$D_h(\sqrt{t}\,X + Z) = D_h(X + Z/\sqrt{t}) \to D_h(X)$ as $t \to +\infty$. Therefore, integrating (26) 
from t = 0 to $+\infty$ we readily obtain the MMSE integral 
representation for differential entropy [11]: 

$$D_h(X) = \frac{1}{2} \int_0^{\infty} D_V(X | \sqrt{t}\,X + Z)\, dt \tag{28}$$

or 

$$h(X) = \frac{1}{2} \log(2\pi e \sigma^{2}) - \frac{1}{2} \int_0^{\infty} \left[ \frac{\sigma^{2}}{1 + t\sigma^{2}} - \mathrm{Var}(X | \sqrt{t}\,X + Z) \right] dt. \tag{29}$$

This MMSE representation was first derived in [11] and used 
in [9], [10] to prove the EPI. The FI representation (25) can 
be used similarly, yielding essentially Stam's proof of the 
EPI [6], [7]. These proofs are sketched below. 

Note that the equivalence between FI and MMSE 
representations (24), (28) is immediate by the complementary 
relation (10), which can be simply written as 

$$D_J(X + Z) = D_V(X | X + Z). \tag{30}$$

In fact, it is easy to see that (24) and (28) prove each 
other by (30) after making a change of variable u = 1/t. 
Both sides of (30) are of course simultaneously nonnegative 
and measure a "non-Gaussianness" of X when estimated in 
additive Gaussian noise. Interestingly, these FI and MMSE 
non-Gaussianities coincide. 

It is also useful to note that in the above derivations, the 
Gaussian random variable $X_G$ may very well be chosen such 
that $\sigma^{2} = \mathrm{Var}(X_G)$ is not equal to Var(X). Formulas (21)-(30) 
still hold, even though quantities such as $D_h$, $D_J$, and 
$D_V$ may take negative values. In particular, the right-hand 
sides of (25) and (29) do not depend on the particular choice 
of $\sigma^{2}$. 



E. Simplified proofs of the EPI via FI and via MMSE 

For Gaussian random variables $X_G$, $Y_G$ with the same 
variance, the EPI in the form (3) holds trivially with equality. 
Therefore, to prove the EPI, it is sufficient to show the 
following convexity property: 

$$D_h(W) \le \lambda D_h(X) + (1-\lambda) D_h(Y), \tag{31}$$

where $D_h(\cdot)$ is defined by (21) and W is defined as in (4). 
By the last remark of the preceding subsection, the FI and 
MMSE representations (24), (28) hold. Therefore, to prove the 
EPI, it is sufficient to show that either one of the following 
inequalities holds: 

$$\begin{aligned}
D_J(W + \sqrt{t}\,Z) &\le \lambda D_J(X + \sqrt{t}\,Z) + (1-\lambda) D_J(Y + \sqrt{t}\,Z), \\
D_V(W | \sqrt{t}\,W + Z) &\le \lambda D_V(X | \sqrt{t}\,X + Z) + (1-\lambda) D_V(Y | \sqrt{t}\,Y + Z).
\end{aligned}$$

These in turn can be written as 

$$J(W + \sqrt{t}\,Z) \le \lambda J(X + \sqrt{t}\,Z) + (1-\lambda) J(Y + \sqrt{t}\,Z) \tag{32}$$

$$\mathrm{Var}(W | \sqrt{t}\,W + Z) \ge \lambda\, \mathrm{Var}(X | \sqrt{t}\,X + Z) + (1-\lambda)\, \mathrm{Var}(Y | \sqrt{t}\,Y + Z), \tag{33}$$

because these inequalities hold trivially with equality for $X_G$ 
and $Y_G$. 

Inequality (33) is easily proved in the form 

$$\mathrm{Var}(W | \sqrt{t}\,W + Z) \ge \mathrm{Var}(W | \sqrt{t}\,X + Z', \sqrt{t}\,Y + Z''), \tag{34}$$

where Z' and Z'' are standard Gaussian random variables, 
independent of X, Y and of each other, and 
$Z = \sqrt{\lambda}\,Z' + \sqrt{1-\lambda}\,Z''$. This has a simple interpretation [9], [10]: it is 
better to estimate the sum of independent random variables 
from individual noisy measurements than from the sum of 
these measurements. 

The dual inequality (32) can also be proved in the form 

$$J(W) \le \lambda J(X) + (1-\lambda) J(Y), \tag{35}$$

where we have substituted X, Y for $X + \sqrt{t}\,Z'$, $Y + \sqrt{t}\,Z''$. 
This inequality is known as the Fisher's information 
inequality [5], and an equivalent formulation is 

$$\frac{1}{J(X + Y)} \ge \frac{1}{J(X)} + \frac{1}{J(Y)}. \tag{36}$$

Blachman [7] gave a short proof using representation (12) and 
Zamir [12] gave an insightful proof using a data processing 
argument. Again (36) has a simple interpretation, which is 
very similar to that of (34). Here 1/J(X) is the Cramér-Rao 
lower bound (CRB) of the mean-squared error of the unbiased 
estimator X for a translation parameter, and (36) states that 
in terms of the CRB it is better to estimate the translation 
parameter corresponding to the sum of independent random 
variables X + Y from individual measurements than from the 
sum of these measurements. 
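
To make the Fisher information inequality in its reciprocal form 1/J(X + Y) ≥ 1/J(X) + 1/J(Y) concrete, the following Python sketch evaluates both sides numerically for a hypothetical non-Gaussian example: X and Y are assumed to be iid with the mixture density ½ N(-1, 1) + ½ N(1, 1), so that X + Y has density ¼ N(-2, 2) + ½ N(0, 2) + ¼ N(2, 2). The helper names, the (weight, mean, variance) encoding of mixtures and the integration grid are illustrative assumptions.

```python
import math

def mix_pdf(y, comps):
    """Gaussian mixture density; comps is a list of (weight, mean, variance)."""
    return sum(w * math.exp(-(y - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
               for w, m, v in comps)

def mix_pdf_prime(y, comps):
    """Derivative of the mixture density with respect to y."""
    return sum(-w * (y - m) / v * math.exp(-(y - m) ** 2 / (2.0 * v))
               / math.sqrt(2.0 * math.pi * v)
               for w, m, v in comps)

def fisher_information(comps, lo=-20.0, hi=20.0, n=20000):
    """Midpoint-rule estimate of J = integral of p'(y)^2 / p(y)."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dx
        total += mix_pdf_prime(y, comps) ** 2 / mix_pdf(y, comps) * dx
    return total

x_comps = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]                      # density of X and of Y
sum_comps = [(0.25, -2.0, 2.0), (0.5, 0.0, 2.0), (0.25, 2.0, 2.0)]  # density of X + Y

j_x = fisher_information(x_comps)
j_sum = fisher_information(sum_comps)
print(1.0 / j_sum, 2.0 / j_x)  # reciprocal form: first value should not be smaller
```

For a single Gaussian component the same routine returns the reciprocal of the variance, in which case the inequality holds with equality.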



At any rate, the equivalence between (32) and (33) is 
immediate by the complementary relation (10) or its generalized 
version (14), as can be easily seen. Either one of (32), (33) 
gives a proof of the EPI. ■ 

The above derivations illuminate intimate connections 
between both proofs of the EPI, via FI and via MMSE. Not only 
do they follow the same lines of argumentation, but they are 
also shown to be dual in the sense that, through (10), each 
step in these proofs has an equivalent formulation and similar 
interpretation in terms of FI or MMSE. 

III. A new proof of the EPI 

For convenience in the following proof, define W by (4) 
and let $\alpha = \sqrt{\lambda t}$ and $\beta = \sqrt{(1-\lambda) t}$. 

A. Proof of Theorem 1 

As in the conventional proofs of the EPI, we 
use the fact that the linear combination 
$\sqrt{\lambda}\,(X + \alpha Z) + \sqrt{1-\lambda}\,(Y + \beta Z) = W + \sqrt{t}\,Z$ cannot bring more information 
than the individual variables $X + \alpha Z$ and $Y + \beta Z$ together. 
Thus, by the data processing inequality for mutual information, 

$$I(W + \sqrt{t}\,Z; Z) \le I(X + \alpha Z, Y + \beta Z; Z). \tag{37}$$

Let $U = X + \alpha Z$, $V = Y + \beta Z$ and develop using the chain 
rule for mutual information: 

$$\begin{aligned}
I(U, V; Z) &= I(U; Z) + I(V; Z | U) \\
&\le I(U; Z) + I(V; Z | U) + I(U; V) \\
&= I(U; Z) + I(V; U, Z) \\
&= I(U; Z) + I(V; Z) + I(U; V | Z).
\end{aligned}$$

Since X and Y are independent, U and V are conditionally 
independent given Z, and therefore, $I(U; V | Z) = 0$. Thus, we 
obtain the inequality 

$$I(W + \sqrt{t}\,Z; Z) \le I(X + \alpha Z; Z) + I(Y + \beta Z; Z). \tag{38}$$

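Assume for the moment that $I(X + \alpha Z; Z)$ admits a 
second-order Taylor expansion about $\alpha = 0$ as $t \to 0$. Since 
$I(X + \alpha Z; Z)$ vanishes for $\alpha = 0$, and since mutual information is 
nonnegative, the expansion has no first-order term in $\alpha$, that is, 
$I(X + \alpha Z; Z) = c\,\alpha^{2} + o(\alpha^{2})$ for some constant $c \ge 0$. With 
$\alpha^{2} = \lambda t$, and using the same expansion with $\alpha$ replaced by $\sqrt{t}$, 
we may write 

$$I(X + \alpha Z; Z) = \lambda I(X + \sqrt{t}\,Z; Z) + o(\alpha^{2}),$$

where $o(\alpha^{2}) = o(t)$ denotes a term that is negligible compared to t 
as $t \to 0$. Similarly, 
$I(Y + \beta Z; Z) = (1-\lambda) I(Y + \sqrt{t}\,Z; Z) + o(t)$. It follows that 
in the vicinity of t = 0, 

$$I(W + \sqrt{t}\,Z; Z) \le \lambda I(X + \sqrt{t}\,Z; Z) + (1-\lambda) I(Y + \sqrt{t}\,Z; Z) + o(t). \tag{39}$$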

We now remove the o(t) term by using the assumption that 
Z is Gaussian. Consider the variables $X' = X + \sqrt{t'}\,Z'_1$, 
$Y' = Y + \sqrt{t'}\,Z'_2$, where $Z'_1$, $Z'_2$ are identically distributed as 
Z but independent of all other random variables. This Gaussian 
perturbation ensures that densities are smooth, so that 
$I(X' + \alpha Z; Z)$ and $I(Y' + \beta Z; Z)$ both admit a second-order Taylor 
expansion about t = 0. We may, therefore, apply (39) to X' 
and Y', which gives 

$$I(W + \sqrt{t'}\,Z' + \sqrt{t}\,Z; Z) \le \lambda I(X + \sqrt{t'}\,Z'_1 + \sqrt{t}\,Z; Z) + (1-\lambda) I(Y + \sqrt{t'}\,Z'_2 + \sqrt{t}\,Z; Z) + o(t), \tag{40}$$

where $Z' = \sqrt{\lambda}\,Z'_1 + \sqrt{1-\lambda}\,Z'_2$ is identically distributed as Z. 
Applying the obvious identity 
$I(X + Z' + Z; Z) = I(X + Z' + Z; Z' + Z) - I(X + Z'; Z')$ and using the stability property of 
the Gaussian distribution under convolution, (40) boils down 
to 

$$f(t' + t) \le f(t') + o(t),$$

where we have set 
$f(t) = I(W + \sqrt{t}\,Z; Z) - \lambda I(X + \sqrt{t}\,Z; Z) - (1-\lambda) I(Y + \sqrt{t}\,Z; Z)$. 
It follows that f(t) is nonincreasing in t, and since it clearly 
vanishes for t = 0, it assumes nonpositive values for all t > 0. 
This completes the proof of (9). ■ 

Interestingly, this proof uses two basic ingredients common 
to earlier proofs presented in Section II: 1) the fact that two 
variables together bring more information than their sum, 
which is here expressed as a data processing inequality for 
mutual information; 2) a Gaussian perturbation argument using 
an auxiliary variable Z. 

In fact, Theorem 1 could also be proved using either one 
of the integral representations (25), (29), which are equivalent 
by virtue of (10) and obtained through de Bruijn's identity as 
explained in Section II. The originality of the present proof is 
that it requires neither de Bruijn's identity nor the notions 
of FI or MMSE. 

B. The discrete case 

The above proof of Theorem 1 does not require that X or Y 
be random variables with densities. Therefore, Theorem 1 also 
holds for discrete (finitely or countably) real valued random 
variables. Verdú and Guo [9] proved that the EPI, in the 
form (3), also holds in this case, where differential entropies 
are replaced by entropies. We point out that this is in fact 
a trivial consequence of the stronger inequality 

$$H(\sqrt{\lambda}\,X + \sqrt{1-\lambda}\,Y) \ge \max(H(X), H(Y)) \tag{41}$$

for any two independent discrete random variables X and Y. 
This inequality is easily obtained by noting that 

$$H(W) \ge H(W | Y) = H(X | Y) = H(X) \tag{42}$$

and similarly for H(Y). 
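
The bound H(W) ≥ max(H(X), H(Y)) for independent discrete X and Y can be illustrated on a small example. The Python sketch below assumes, hypothetically, that X and Y are independent fair bits on {0, 1} and that λ = 1/2, in which case W takes three distinct values with probabilities 1/4, 1/2, 1/4 and H(W) = (3/2) log 2. The function names and the rounding used to merge coinciding values of W are illustrative choices.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in nats of a pmf given as {value: probability}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0.0)

def combination_pmf(pmf_x, pmf_y, lam):
    """pmf of W = sqrt(lam) X + sqrt(1 - lam) Y for independent X and Y."""
    pmf_w = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            w = math.sqrt(lam) * x + math.sqrt(1.0 - lam) * y
            pmf_w[round(w, 12)] += px * py  # rounding merges values that coincide
    return dict(pmf_w)

pmf_x = {0: 0.5, 1: 0.5}
pmf_y = {0: 0.5, 1: 0.5}
pmf_w = combination_pmf(pmf_x, pmf_y, lam=0.5)

h_x, h_y, h_w = entropy(pmf_x), entropy(pmf_y), entropy(pmf_w)
print(h_w >= max(h_x, h_y))
```

For a generic λ the four values of W are distinct, so that H(W) = H(X) + H(Y) and the bound holds with even more margin.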

C. Proof of the EPI 

We now show that the EPI for differential entropies, in the 
form (3), follows from Theorem 1. By the identity 
$I(X + Z; Z) + h(X) = I(X; X + Z) + h(Z)$, inequality (9) can be 
written in the form 

$$h(W) - \lambda h(X) - (1-\lambda) h(Y) \ge I(W; W + \sqrt{t}\,Z) - \lambda I(X; X + \sqrt{t}\,Z) - (1-\lambda) I(Y; Y + \sqrt{t}\,Z). \tag{43}$$

We now let $t \to \infty$ in the right-hand side of this inequality. Let 
$\epsilon = 1/\sqrt{t}$ and $X_G$ be a Gaussian random variable independent 
of Z, with identical second moments as X. Then 
$I(X; X + \sqrt{t}\,Z) = I(X; \epsilon X + Z) = h(\epsilon X + Z) - h(Z) \le h(\epsilon X_G + Z) - h(Z) = \frac{1}{2} \log(1 + \sigma_X^{2}/(t \sigma_Z^{2}))$, 
which tends to zero as $t \to \infty$. 
This holds similarly for the other terms in the right-hand side 
of (43). Therefore, the EPI (3) follows. ■ 

In light of this proof, we see that Theorem 1, which contains 
the EPI as the special case where $\sigma_Z^{2} \to \infty$, merely states that 
the difference $h(W + Z) - \lambda h(X + Z) - (1-\lambda) h(Y + Z)$ 
between both sides of the EPI (3) decreases as independent 
Gaussian noise Z is added. This holds in accordance with the 
fact that this difference is zero for Gaussian random variables 
with identical variances. 

One may wonder if mutual informations in the form 
$I(X; \sqrt{t}\,X + Z)$ rather than $I(X + \sqrt{t}\,Z; Z)$ could be used in 
the above derivation of Theorem 1 and the EPI, in a similar 
manner as Verdú and Guo's proof uses (29) rather than (25). 
In fact, this would amount to proving (33), whose natural 
derivation using the data processing inequality for mutual 
information is through (37). 

The same proof we used above can be employed verbatim to 
prove the EPI for random vectors. Generalizations to various 
extended versions of EPI are provided in a follow-up to this 
work [13]. 